ABSTRACT

In this paper we present a supervised neural network approach to the determination of photometric redshifts. The method, although of general validity, was fine-tuned to match the characteristics of the Sloan Digital Sky Survey (SDSS) and, as its base of 'a priori' knowledge, it exploits the wealth of spectroscopic redshifts provided by this unique survey. In order to train, validate and test the networks, we used two galaxy samples drawn from the SDSS spectroscopic dataset, namely the general galaxy sample (GG) and the luminous red galaxies subsample (LRG). Due to the uneven distribution of measured redshifts in the SDSS spectroscopic subsample, the method follows a two-step approach. In the first step, objects are classified as nearby (z < 0.25) or distant (0.25 < z < 0.50), with an estimated accuracy of 97.52%. In the second step, two different networks are separately trained on objects belonging to the two redshift ranges. Using a standard Multi Layer Perceptron operated in a Bayesian framework, the optimal architectures were found to require a single hidden layer, with 24 and 24 neurons for the two networks trained on the GG sample and 24 and 25 neurons for those trained on the LRG sample. Systematic deviations were then corrected by interpolating the resulting redshifts.
The final results on the GG dataset give a robust σ_z ≈ 0.0208 over the redshift range [0.01, 0.48], and σ_z ≈ 0.0197 and σ_z ≈ 0.0238 for the nearby and distant samples respectively. For the LRG subsample we find instead a robust σ_z ≈ 0.0164 over the whole range, and σ_z ≈ 0.0160 and σ_z ≈ 0.0183 for the nearby and distant samples respectively. After training, the networks were applied to all objects in the SDSS table GALAXY matching the same selection criteria adopted to build the base of knowledge, and photometric redshifts were derived for ca. 30 million galaxies having z < 0.5. A second catalogue containing photometric redshifts for the LRG subsample was also produced. Both catalogues can be downloaded at the URL: http://people.na.infn.it~astroneural/SDSSredshifts.htm



Subject headings: Galaxies: photometric redshifts; Cosmology: large scale structure



1. Introduction 



After the pioneering work by the Belgian astronomer Vandererkhoven, who in the late thirties used prism-objective spectra to derive redshift estimates from the continuum shape and its macroscopic features (notably the Balmer break at ~4000 Å), Baum (1962) was the first to test experimentally the idea that redshifts could be obtained from multiband aperture photometry by sampling the galaxy spectral energy distribution (hereafter SED) at different wavelengths. After a period of relative lack of interest, the 'photometric redshifts' technique was resurrected in the eighties (Butchins 1981), when it became clear that it could prove useful in two similar but methodologically very different fields of application:

i) as a method to evaluate distances when spectroscopic estimates become impossible due to poor signal-to-noise ratio, to instrumental systematics, or to the fact that the objects under study are beyond the spectroscopic limit (cf. Bolzonella et al. 2002);



ii) as an economical way to obtain, at a relatively low price in terms of observing and 
computing time, redshift estimates for large samples of objects. 

The latter field of application has been widely explored in the last few years, as the huge wealth of data produced by a new generation of digital surveys, consisting of accurate multiband photometric data for tens and even hundreds of millions of extragalactic objects, has become available. Photometric redshifts are much less accurate than spectroscopic ones but, even so, if available in large numbers and for statistically well-controlled samples of objects, they still provide a powerful tool to derive a 3-D map of the universe. Such a map is crucial for a variety of applications, among which we quote just a few: to study large scale structure (Brodwin et al. 2006); to constrain the cosmological parameters and models (Blake & Bridle 2005 and references therein; Budavari et al. 2003; Tegmark et al. 2006); to map the matter distribution using weak lensing (Edmonson et al. 2003 and references therein).



In this paper we present a new application of neural networks to the problem of photometric redshift determination and use the method to produce two catalogues of photometric redshifts: one for ~30 million objects extracted from the SDSS-DR5 main GALAXY dataset and a second one for a Luminous Red Galaxies sample.

The paper is structured as follows. In Sections 2 and 3 we briefly summarize the various methods for the determination of photometric redshifts and the theory behind the adopted neural network model. In §4 we describe both the photometric dataset extracted from the SDSS and the base of knowledge used for training and testing, and in §5 we discuss the method and present the results of the experiments. It needs to be stressed that, even though finely tailored to the characteristics of the SDSS data, the method is general and can easily be applied to any other dataset, provided that a large enough base of knowledge is available.

As stressed by several authors, photometric redshift samples are useful only if the structure of the errors is well understood; in §7 we therefore present a discussion of both systematic and random errors and propose a possible strategy to correct for systematic errors (§6). In §8 we briefly describe the two catalogues. Finally, in §9, we discuss the results and present our conclusions.



This paper is the first in a series of three. In the second one (Brescia et al. 2006) we shall present the catalogue of structures extracted in the nearby sample using an unsupervised clustering algorithm working on the three-dimensional dataset produced from the SDSS data. In Paper III (D'Abrusco et al. 2006) we shall complement the information contained in the above-quoted catalogues by discussing the statistical clustering of objects in the photometric parameter space.



2. Photometric redshifts 



Without entering into too much detail, photometric redshift methods can be broadly grouped into a few families: template fitting, hybrid and empirical methods.

Template fitting methods are based on fitting a library of template Spectral Energy Distributions (SEDs) to the observed data, and differ mainly in how these SEDs are derived and in how they are fitted to the data. SEDs may be derived either from population synthesis models (Bruzual A. & Charlot 1993) or from the spectra of real objects (Coleman et al. 1980), carefully selected in order to ensure a sufficient coverage of the parameter space (mainly in terms of morphological types and/or luminosity classes). The pros and cons of both approaches (synthetic and empirical) have been widely discussed in the literature (cf. Koo 1999, but see also Fernandez-Soto et al. 2001; Massarotti et al. 2001a, 2001b; and Csabai et al. 2003). Synthetic spectra, for instance, sample an 'a priori' defined grid of mixtures of stellar populations and may either include unrealistic combinations of parameters or exclude some unknown cases. On the other hand, empirical templates are necessarily derived from nearby and bright galaxies and may therefore not be representative of the spectral properties of galaxies falling in other redshift or luminosity ranges. Attempts to derive a very large and fairly exhaustive set of empirical templates using the SDSS spectroscopic dataset are in progress and will surely prove useful in the near future.
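To make the template-fitting idea concrete, the following is a minimal χ² sketch in Python/NumPy. It assumes that the flux of every template at every trial redshift has already been computed (redshifting each SED and integrating it through the filter curves is omitted), and the function name, array layout and toy numbers are illustrative assumptions rather than any specific published code. The free normalization of each template is solved analytically before the χ² values are compared across the grid.

```python
import numpy as np


def fit_templates(obs_flux, obs_err, model_flux):
    """Chi-square fit of an observed SED against a grid of template fluxes.

    obs_flux, obs_err : (n_bands,) observed fluxes and 1-sigma errors
    model_flux        : (n_templates, n_z, n_bands) flux of each template at
                        each trial redshift (assumed precomputed elsewhere)
    Returns (best template index, best redshift index, chi2 grid).
    """
    w = 1.0 / obs_err**2                            # inverse-variance weights
    # The best-fitting amplitude of each grid cell has a closed form:
    # a = sum(w * F_obs * F_mod) / sum(w * F_mod**2)
    amp = np.sum(w * obs_flux * model_flux, axis=-1) / np.sum(w * model_flux**2, axis=-1)
    resid = obs_flux - amp[..., None] * model_flux
    chi2 = np.sum(w * resid**2, axis=-1)            # shape (n_templates, n_z)
    t_best, z_best = np.unravel_index(np.argmin(chi2), chi2.shape)
    return int(t_best), int(z_best), chi2


# Toy usage: 3 random "templates", 10 trial redshifts, 5 bands.
rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 0.45, 10)
model = rng.uniform(0.5, 2.0, size=(3, 10, 5))
obs = 3.7 * model[1, 4]                             # rescaled copy of template 1 at z_grid[4]
t_best, iz_best, chi2 = fit_templates(obs, np.full(5, 0.1), model)
print(t_best, z_grid[iz_best])
```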

Hybrid SED fitting methods, making use of a combination of both observed and theoretically predicted SEDs, have been proposed with mixed results by several authors (Bolzonella et al. 2000; Padmanabhan et al. 2005).



The last family of methods, i.e. the empirical ones, can be applied only to 'mixed surveys', i.e. to datasets where accurate multiband photometric data for a large number of objects are supplemented by spectroscopic redshifts for a smaller but still significant subsample of the same objects. These spectroscopic data are used to constrain the fit of an interpolating function mapping the photometric parameter space, and the methods differ mainly in the way such interpolation is performed. As pointed out by many authors (Connolly et al. 1995; Csabai et al. 2003), in these methods the main uncertainty comes from the fact that the fitting function is just an approximation of the complex relation existing between the colors and the redshift of a galaxy, and from the fact that, as soon as the redshift range and/or the size of the parameter space increase, a single interpolating function is bound to fail. Attempts to overcome this problem have been proposed by several authors. For instance, Brunner et al. (1999) divided the redshift and color range into several intervals in order to optimize the interpolation. Csabai et al. (2003) used instead an improved nearest neighbor method consisting of finding, for each galaxy in the photometric sample, the galaxy in the training set which has the smallest distance in the parameter space, and then attributing the same redshift to the two objects.
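The plain nearest-neighbor idea can be sketched in a few lines. This is an illustration under stated assumptions (unweighted Euclidean distance in magnitude space, toy magnitudes, hypothetical function name), not a reproduction of the improved method of Csabai et al. (2003).

```python
import numpy as np


def nn_photoz(train_mags, train_z, query_mags):
    """Give each query galaxy the spectroscopic redshift of its nearest
    training galaxy (Euclidean distance in magnitude space)."""
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = np.sum((query_mags[:, None, :] - train_mags[None, :, :]) ** 2, axis=-1)
    return train_z[np.argmin(d2, axis=1)]


train_mags = np.array([[18.0, 17.2, 16.8, 16.5, 16.3],    # toy u, g, r, i, z
                       [20.1, 18.9, 18.0, 17.6, 17.3]])
train_z = np.array([0.05, 0.30])
query = np.array([[18.1, 17.3, 16.9, 16.5, 16.2],
                  [20.0, 18.8, 18.1, 17.7, 17.2]])
print(nn_photoz(train_mags, train_z, query))
```

The brute-force distance matrix is chosen only for brevity; for training sets of hundreds of thousands of galaxies a spatial index such as a k-d tree would be the natural choice.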

More recently, several attempts to interpolate the a priori knowledge provided by the spectroscopic redshifts have been made using statistical pattern recognition techniques such as neural networks (Tagliaferri et al. 2002; Vanzella et al. 2004; Firth et al. 2003) and Support Vector Machines (Wadadekar 2005), with results which will be discussed in more detail in what follows.



It has to be stressed that, since the base of knowledge is purely empirical (i.e. spectroscopically measured redshifts), these methods cannot be effectively applied to objects fainter than the spectroscopic limit. To partially overcome this problem, notable attempts have been made to build a 'synthetic' base of knowledge using spectral synthesis models, but it is apparent that, in this case, the uncertainties of the SED fitting and empirical methods add up.

In any case, it is by now well established that when a significant base of knowledge is available, empirical methods outperform template fitting ones, and that the use of the latter should be confined to those cases where a suitable base of knowledge is missing.



3. The Multi Layer Perceptron 



Neural Networks (hereafter NNs) have long been known to be excellent tools for interpolating data and for extracting patterns and trends, and in recent years they have also found their way into the astronomical community for a variety of applications (see the reviews by Tagliaferri et al. 2003a,b and references therein), ranging from star-galaxy separation (Donalek 2007) and spectral classification (Winter et al. 2004) to photometric redshift evaluation (Tagliaferri et al. 2002; Firth et al. 2003). In practice, a neural network is a tool which takes a set of input values (input neurons), applies a non-linear (and unknown) transformation and returns an output. The optimization of the output is performed by using a set of examples for which the output value is known a priori. NNs exist in many different models and architectures but, since the relatively low complexity of astronomical data does not pose special constraints on any step of the method discussed below, we used a very simple neural model known as the Multi-Layer Perceptron (MLP), which is probably the most widely used architecture for practical applications of neural networks.

In most cases an MLP consists of two layers of adaptive weights with full connectivity between inputs and intermediate (namely, hidden) units, and between hidden units and outputs (see Fig. 1). Note, however, that an alternative convention is sometimes found in the literature, which counts layers of units rather than layers of weights and regards the inputs as separate units. According to this convention, the network shown in Fig. 1 would be called a three-layer network. However, since the layers of adaptive weights are those which really matter in determining the properties of the network function, we adopt the former convention.



3.1. MLP: the flow of the computation

The MLP realizes a complex nonlinear mapping from the input to the output space. Let us denote the $N$ input values to the network by $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$. The first layer of the network forms linear combinations of these inputs to give a set of intermediate activation variables $a_j^{(1)}$:

$$ a_j^{(1)} = \sum_{i=1}^{N} w_{ji}^{(1)} x_i + b_j^{(1)}, \qquad j = 1, \ldots, M \tag{1} $$

with one variable $a_j^{(1)}$ associated with each of the $M$ hidden units. Here $w_{ji}^{(1)}$ represents the elements of the first-layer weight matrix and $b_j^{(1)}$ are the bias parameters associated with the hidden units. The variables $a_j^{(1)}$ are then transformed by the nonlinear activation functions of the hidden layer. Here we restrict attention to tanh activation functions. The outputs of the hidden units are then given by

$$ z_j = \tanh\big(a_j^{(1)}\big), \qquad j = 1, \ldots, M \tag{2} $$

The $z_j$ are then transformed by the second layer of weights and biases to give the second-layer activation values $a_k^{(2)}$:

$$ a_k^{(2)} = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + b_k^{(2)}, \qquad k = 1, \ldots, c \tag{3} $$

where $c$ is the number of output units. Finally, these values are passed through the output-unit activation function to give output values $y_k$, with $k = 1, \ldots, c$. Depending on the nature of the problem under consideration we have:

• for regression problems, a linear activation function, i.e. $y_k = a_k^{(2)}$;

• for classification problems, a logistic sigmoidal activation function applied to each of the outputs independently, i.e.

$$ y_k = \frac{1}{1 + \exp\big(-a_k^{(2)}\big)}. $$
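Equations (1)-(3) and the two output choices translate directly into array operations. The NumPy sketch below assumes inputs stored one object per row; the function name and weight layout are illustrative, not the code used for this paper. With N = 5 magnitudes, M = 24 hidden units and c = 1 output, the network has (N + 1)M + (M + 1)c = 169 adaptive parameters, which the example counts explicitly.

```python
import numpy as np


def mlp_forward(x, W1, b1, W2, b2, task="regression"):
    """Forward pass of a one-hidden-layer MLP, following eqs. (1)-(3).

    x : (n_samples, N) inputs;  W1 : (M, N), b1 : (M,);  W2 : (c, M), b2 : (c,)
    """
    a1 = x @ W1.T + b1              # eq. (1): hidden activations a_j^(1)
    z = np.tanh(a1)                 # eq. (2): hidden-unit outputs z_j
    a2 = z @ W2.T + b2              # eq. (3): output activations a_k^(2)
    if task == "regression":
        return a2                   # linear output units
    if task == "classification":
        return 1.0 / (1.0 + np.exp(-a2))   # independent logistic sigmoids
    raise ValueError(f"unknown task: {task!r}")


rng = np.random.default_rng(0)
N, M, c = 5, 24, 1                  # five magnitudes, 24 hidden units, one output
W1, b1 = 0.1 * rng.normal(size=(M, N)), np.zeros(M)
W2, b2 = 0.1 * rng.normal(size=(c, M)), np.zeros(c)
x = rng.normal(size=(3, N))
y_reg = mlp_forward(x, W1, b1, W2, b2)
y_cls = mlp_forward(x, W1, b1, W2, b2, task="classification")
n_params = sum(p.size for p in (W1, b1, W2, b2))
print(y_reg.shape, n_params)        # (3, 1) 169
```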






3.2. MLP Training Phase 



The basic learning algorithm for MLPs is the so-called backpropagation, which is based on the error-correction learning rule. In essence, backpropagation consists of two passes through the different layers of the network: a forward pass and a backward pass. In the forward pass an input vector is applied to the input nodes of the network, and its effect propagates through the network layer by layer. Finally, a set of outputs is produced as the actual response of the network. During the backward pass, on the other hand, the weights are all adjusted in accordance with the error-correction rule. Specifically, the actual response of the network is subtracted from a desired (target) response, which we denote as a vector $\mathbf{t} = \{t_1, t_2, \ldots, t_c\}$, to produce an error signal. This error signal is then propagated backward through the network. Several choices are possible for the form of the error function, and the choice depends on the nature of the problem; in particular:

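• for regression problems we adopted the sum-of-squares error function, where the index $n$ runs over the training patterns:

$$ E = \frac{1}{2} \sum_{n} \sum_{k=1}^{c} \big(y_k^n - t_k^n\big)^2 $$

• for classification problems we used the cross-entropy error function:

$$ E = -\sum_{n} \sum_{k=1}^{c} \big[ t_k^n \ln y_k^n + (1 - t_k^n) \ln(1 - y_k^n) \big] $$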
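The two error functions above can be written as follows. This is a minimal sketch in which `y` and `t` are arrays of shape (patterns, outputs); the clipping constant in the cross-entropy is an added assumption that guards against log(0).

```python
import numpy as np


def sum_of_squares(y, t):
    """E = 1/2 * sum_n sum_k (y_k^n - t_k^n)^2 (regression)."""
    return 0.5 * np.sum((y - t) ** 2)


def cross_entropy(y, t, eps=1e-12):
    """E = -sum_n sum_k [t ln y + (1 - t) ln(1 - y)] (independent logistic outputs)."""
    y = np.clip(y, eps, 1.0 - eps)   # guard against log(0)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))


# Two patterns, one output each
print(sum_of_squares(np.array([[0.12], [0.30]]), np.array([[0.10], [0.33]])))
print(cross_entropy(np.array([[0.9], [0.2]]), np.array([[1.0], [0.0]])))
```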



The weights are adjusted to make the actual response of the network move closer to the desired response in a statistical sense. Rather than plain gradient-descent backpropagation, in this work we adopted a computationally more efficient optimization scheme, namely the quasi-Newton method. Furthermore, we employed a weight-decay regularization technique in order to limit the overfitting of the neural model to the training data; the error function therefore takes the form

$$ \tilde{E} = E + \frac{\nu}{2} \sum_i w_i^2 $$

where the sum runs over all the weights and biases. The parameter $\nu$ controls the extent to which the penalty term $\frac{1}{2}\sum_i w_i^2$ influences the form of the solution.
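The sketch below combines the weight-decay error above with gradients computed by backpropagation and minimizes it with SciPy's BFGS routine, which is one quasi-Newton method. The paper does not state which quasi-Newton variant was used, so BFGS is an assumed stand-in; the toy data, the network size (M = 8) and ν = 10⁻³ are also illustrative choices. As in the text, the penalty runs over all weights and biases.

```python
import numpy as np
from scipy.optimize import minimize


def unpack(theta, N, M):
    """Split the flat parameter vector into W1 (M, N), b1 (M,), W2 (1, M), b2 (1,)."""
    W1 = theta[:M * N].reshape(M, N)
    b1 = theta[M * N:M * N + M]
    W2 = theta[M * N + M:M * N + 2 * M].reshape(1, M)
    b2 = theta[M * N + 2 * M:]
    return W1, b1, W2, b2


def loss_and_grad(theta, X, t, M, nu):
    """Weight-decay error E + (nu/2) sum(theta**2) for a tanh MLP with one linear
    output, and its gradient obtained by backpropagation."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], M)
    z = np.tanh(X @ W1.T + b1)                    # forward pass, eqs. (1)-(2)
    y = z @ W2.T + b2                             # eq. (3), linear output
    d_out = y - t                                 # dE/da^(2) for sum-of-squares
    d_hid = (d_out @ W2) * (1.0 - z**2)           # backward pass through tanh
    grad = np.concatenate([(d_hid.T @ X).ravel(), d_hid.sum(0),
                           (d_out.T @ z).ravel(), d_out.sum(0)])
    E = 0.5 * np.sum(d_out**2) + 0.5 * nu * np.sum(theta**2)
    return E, grad + nu * theta


# Toy regression problem: 5 inputs, 8 hidden units, weight-decay coefficient nu.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
t = np.sin(X[:, :1]) + 0.1 * X[:, 1:2]            # shape (200, 1)
M, nu = 8, 1e-3
theta0 = 0.1 * rng.normal(size=M * 5 + 2 * M + 1)
# BFGS is a quasi-Newton method; jac=True means loss_and_grad returns (E, grad).
res = minimize(loss_and_grad, theta0, args=(X, t, M, nu), jac=True, method="BFGS",
               options={"maxiter": 500})
print(loss_and_grad(theta0, X, t, M, nu)[0], res.fun)
```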

It must be stressed that the universal approximation theorem (Haykin 1999) states that the two-layer architecture is capable of universal approximation, and a considerable number of papers have appeared in the literature discussing this property (cf. Bishop 1995 and references therein). An important corollary of this result is that, in the context of a classification problem, networks with sigmoidal nonlinearities and two layers of weights can approximate any decision boundary to arbitrary accuracy. Thus, such networks also provide universal non-linear discriminant functions. More generally, the capability of such networks to approximate general smooth functions allows them to model posterior probabilities of class membership. Since two layers of weights suffice to approximate any continuous function to arbitrary accuracy, one would need special problem conditions or requirements (Duda et al. 1973) to recommend the use of more than two layers. Furthermore, it is found empirically that networks with multiple hidden layers are more prone to getting caught in undesirable local minima. Astronomical data do not seem to require such a level of complexity, and therefore two layers of weights, i.e. a single hidden layer, are sufficient.

NNs can also be trained in a Bayesian framework, which makes it possible to find the most efficient network among a population of NNs differing in the hyperparameters controlling the learning of the network (Bishop 1995), in the number of hidden nodes, etc. The most important hyperparameters are the so-called $\alpha$ and $\beta$. $\alpha$ is related to the weights of the network and allows one to estimate the relative importance of the different inputs and to select the input parameters which are most relevant to a given task (Automatic Relevance Determination; Bishop 1995). In fact, a larger value for a component of $\alpha$ implies a less meaningful corresponding weight. $\beta$ is instead related to the variance of the noise: a smaller value of $\beta$ corresponds to a larger noise and therefore to a lower reliability of the network. The implementation of a Bayesian framework requires several steps: initialization of the weights and hyperparameters, and training of the network via a nonlinear optimization algorithm in order to minimize the total error function. Every few cycles of the algorithm the hyperparameters are re-estimated, and the cycles are then repeated.
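A minimal sketch of the hyperparameter re-estimation step, under a simplifying assumption: for an MLP the Hessian must be approximated, so here the standard evidence-approximation updates (as described, e.g., in Bishop 1995) are shown for a model that is linear in its weights, where they are exact. A single $\alpha$ is used, whereas Automatic Relevance Determination would assign one $\alpha$ per input group. The function name and toy data are illustrative.

```python
import numpy as np


def evidence_reestimate(Phi, t, alpha=1.0, beta=1.0, n_iter=50):
    """Evidence-framework re-estimation of alpha (weight prior) and beta (noise
    precision) for a model that is linear in its weights, y = Phi @ w."""
    n, p = Phi.shape
    eig = np.linalg.eigvalsh(Phi.T @ Phi)            # eigenvalues of the data Hessian / beta
    for _ in range(n_iter):
        A = beta * Phi.T @ Phi + alpha * np.eye(p)     # Hessian of beta*E_D + alpha*E_W
        w = beta * np.linalg.solve(A, Phi.T @ t)       # most probable weights
        lam = beta * eig
        gamma = np.sum(lam / (lam + alpha))            # effective number of well-determined weights
        alpha = gamma / np.sum(w**2)                   # alpha = gamma / (2 E_W)
        beta = (n - gamma) / np.sum((Phi @ w - t)**2)  # beta = (n - gamma) / (2 E_D)
    return w, alpha, beta, gamma


rng = np.random.default_rng(2)
Phi = rng.normal(size=(300, 6))
w_true = rng.normal(size=6)
t = Phi @ w_true + rng.normal(scale=0.1, size=300)    # noise sigma 0.1, i.e. beta = 100
w, alpha, beta, gamma = evidence_reestimate(Phi, t)
print(round(beta, 1), round(gamma, 2))
```

The recovered $\beta$ should land near the true noise precision of 100, illustrating the statement above that a small $\beta$ signals large noise.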



4. The data and the 'base of knowledge' 

The Sloan Digital Sky Survey (hereafter SDSS) is an ongoing survey to image approximately π steradians of the sky in five photometric bands (u, g, r, i, z), and it is also the only survey so far to be complemented by spectroscopic data for ~10^6 objects (cf. the SDSS webpages at http://www.sdss.org/ for further details). The existence of such a spectroscopic subset (hereafter SpS), together with the accurate characterization of biases and errors, renders the SDSS a unique and ideal ground on which to train and test most photometric redshift methods.



Several criteria may be adopted in extracting galaxy data from the SDSS database (Yasuda et al. 2001). We preferred, however, to adopt the standard SDSS criterion and use the GALAXY table membership. The data used in this work were therefore extracted from the SDSS catalogues. More specifically, the spectroscopic subsample (SpS) used for training and testing purposes was extracted from the Data Release 4 (hereafter DR4; cf. Adelman-McCarthy et al. 2006). While this work was in progress, the Data Release 5 (DR5) was made publicly available. Thus, the photometric data used to produce the final catalogues were derived from the latter. We wish to stress that this extension of the dataset was made possible by the fact that the properties of DR5 are the same as those of DR4, except for a wider sky coverage.

In this paper we made use of two different bases of knowledge extracted from the SpS of the DR4:

• The General Galaxy Sample, or GG: composed of 445,933 objects with z < 0.5 matching the following selection criteria: dereddened magnitude in the r band r < 21; mode = 1, which corresponds to primary objects only in the case of deblended sources.

• The Luminous Red Galaxies sample, or LRG: composed of 97,475 luminous red galaxy candidates having spectroscopic redshift z < 0.5. The SDSS spectroscopic survey (Eisenstein et al. 2001) was planned in order to favour the observation of the so-called Luminous Red Galaxies (LRGs), which are expected to represent a more homogeneous population of luminous elliptical galaxies and can be effectively used to trace the large scale structures (Eisenstein et al. 2001). We therefore extracted from the SDSS-DR4 all objects matching the above-listed criteria and, furthermore, flagged as primTarget = TARGET_GALAXY_RED.
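The selection criteria above can be sketched as boolean masks over a catalogue held in a NumPy structured array. The field names are hypothetical, the bit value of TARGET_GALAXY_RED is an assumption to be checked against the SDSS documentation, and missing detections are assumed to be stored as NaN (if the archive uses a sentinel value instead, the detection test must be adapted). The five-band detection requirement stated at the end of this section is included; the additional LRG color and magnitude cuts of Padmanabhan et al. (2005) are not reproduced.

```python
import numpy as np

# ASSUMED bit value of TARGET_GALAXY_RED in the SDSS primTarget bitmask;
# check it against the SDSS target-selection documentation before use.
TARGET_GALAXY_RED = 0x20
BANDS = ("dered_u", "dered_g", "dered_r", "dered_i", "dered_z")   # hypothetical field names


def select_samples(cat):
    """Boolean masks for the GG and (flag-only) LRG samples of a structured array."""
    detected = np.all([np.isfinite(cat[b]) for b in BANDS], axis=0)   # missing -> NaN (assumed)
    gg = (cat["dered_r"] < 21.0) & (cat["mode"] == 1) & (cat["z"] < 0.5) & detected
    lrg = gg & ((cat["primTarget"] & TARGET_GALAXY_RED) != 0)
    return gg, lrg


dtype = [(b, "f8") for b in BANDS] + [("mode", "i4"), ("z", "f8"), ("primTarget", "i8")]
cat = np.array([
    (19.0, 17.5, 16.9, 16.6, 16.4, 1, 0.08, 0x40),
    (22.0, 20.1, 19.0, 18.5, 18.2, 1, 0.35, 0x20),
    (21.0, 20.0, 21.5, 19.0, 18.8, 1, 0.20, 0x40),     # too faint in r
    (19.5, 18.0, 17.2, 16.9, np.nan, 1, 0.10, 0x40),   # not detected in z
], dtype=dtype)
gg, lrg = select_samples(cat)
print(gg, lrg)
```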



LRGs are of high cosmological relevance since they are both very luminous (and therefore allow us to map the universe out to large distances) and clearly related to the cosmic structures (being preferentially found in clusters). Furthermore, their spectral energy distribution is rather uniform, with a strong break at 4000 Å produced by the superposition of a large number of metal lines (Schneider et al. 1983; Eisenstein et al. 2003). LRGs are therefore an ideal target to test the validity of photometric redshift algorithms (see, for instance, Hamilton 1985; Gladders & Yee 2000; Eisenstein et al. 2001; Willis et al. 2001; Padmanabhan et al. 2005). The selection of LRG objects was performed using the same criteria extensively described in Padmanabhan et al. (2005) and, given the rather lengthy procedure, we refer to that paper for a detailed description of the cuts introduced in the parameter space.

Since it is well known that photometric redshift estimates depend on morphological type, age, metallicity, dust, etc., it is to be expected that, if some morphological parameters are taken into account besides magnitudes or colors alone, estimates of photometric redshifts should become more accurate. Such an effect was found, for instance, by Tagliaferri et al. (2002) and Vanzella et al. (2004).
In order to be conservative, and also because it is not always simple to understand which parameters might carry relevant information, for each object we extracted from the SDSS database not only the photometric data but also the additional parameters listed in Table 1.

These parameters are of two types: those which we call 'features' (marked as F in Table 1) are parameters which may potentially carry some useful information capable of improving the accuracy of photometric redshifts, while those named 'labels' (marked as L) can be used to better understand the biases and the characteristics of the 'base of knowledge'.

As far as magnitudes are concerned, and unlike other groups who used the modelMag, we used the so-called dereddened magnitudes (dered), corrected for the best available estimate of the SDSS photometric zero-points:

$$ \Delta(u, g, r, i, z) = (-0.042,\ 0.036,\ 0.015,\ 0.013,\ -0.002) $$

as reported in Padmanabhan et al. (2005). It has to be stressed, however, that such corrections are of little relevance for empirical methods, since they affect all datasets equally.
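The point that zero-point offsets matter little for empirical methods can be checked numerically: a per-band constant shifts every object's colors by the same amounts, so a network trained and applied on identically corrected data sees only a constant translation of its inputs. The sketch assumes the convention corrected = dered + offset; the helper names and toy magnitudes are illustrative.

```python
import numpy as np

# Zero-point offsets for (u, g, r, i, z) as quoted above (Padmanabhan et al. 2005).
ZP_OFFSET = np.array([-0.042, 0.036, 0.015, 0.013, -0.002])


def apply_zero_points(dered):
    """Apply the offsets to an (n_objects, 5) array of dereddened magnitudes.
    The sign convention (corrected = dered + offset) is an assumption."""
    return np.asarray(dered, dtype=float) + ZP_OFFSET


def colors(mags):
    """Adjacent-band colors u-g, g-r, r-i, i-z."""
    return mags[:, :-1] - mags[:, 1:]


mags = np.array([[19.0, 17.5, 16.9, 16.6, 16.4],
                 [22.0, 20.1, 19.0, 18.5, 18.2]])
shift = colors(apply_zero_points(mags)) - colors(mags)
print(shift)   # identical rows: every object's colors move by the same constants
```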
Table 1. List of the parameters extracted from the SDSS database and used in the experiments.

N   Parameter    Description                                                                 F/L
    z            spectroscopic redshift
    objID        SDSS identification code
    ra           right ascension (J2000)
    dec          declination (J2000)
1   petroR50_i   radius containing 50% of the Petrosian flux in band i (i = u, g, r, i, z)   F
2   petroR90_i   radius containing 90% of the Petrosian flux in band i (i = u, g, r, i, z)   F
3   dered_i      dereddened magnitude in band i (i = u, g, r, i, z)                          F
4   lnLDeV_r     log likelihood for a de Vaucouleurs profile, r band                         F
5   lnLExp_r     log likelihood for an exponential profile, r band                           F
6   lnLStar_r    log likelihood for a PSF profile, r band                                    F
    specClass    spectral classification index                                               L

Note. — Column 1: running number for features only. Column 2: SDSS code. Column 3: short explanation. Column 4: type of parameter, either feature (F) or label (L).
Finally, we must stress that we imposed the condition that the objects had to be 'primary' (mode = 1) and detected in all five bands. The latter condition is required because all empirical methods suffer, one way or another, from the presence of missing data and, to our knowledge, no clear-cut method has been found to overcome this problem.



4.1. Feature selection



In order to evaluate the significance of the additional features, our first set of experiments was performed along the same lines as described in Tagliaferri et al. (2002), using a Multi Layer Perceptron with 1 hidden layer and 24 neurons. In each experiment, the training, validation and test sets were constructed by randomly extracting from the overall dataset three subsets, containing respectively 60%, 20% and 20% of the total number of galaxies.

On this sample we ran a total of N + 1 experiments. The first one was performed using all features, while the other N were performed removing the i-th feature, with i = 1, ..., N. For each experiment, following Csabai et al. (2003), we used the test set to evaluate the robust sigma $\sigma_3$ of the residuals, obtained by excluding all points deviating by more than 3σ (see §7). The values are listed in Table 2.
Table 2. Results of the feature significance estimation.

Parameters          σ_3
all                 0.0202
all but 1           0.0209
all but 2           0.0213
all but 4 & 5       0.0214
all but 6           0.0215
only magnitudes     0.0199

Note. — Column 1: features used; features are numbered as in Table 1. Column 2: robust sigma of the residuals.
As can be seen, the most significant parameters are the magnitudes (or the colors). Other parameters affect only the third digit of the robust sigma. Because of the large increase in computing time during the training phase (which scales as N^2, where N is the number of input features), and to avoid a loss of generality at higher redshifts, where additional features such as the Petrosian radii are either impossible to measure or affected by large errors, we preferred to drop all additional features and use only the magnitudes. The fact that, contrary to what was found by Vanzella et al. (2004) and Tagliaferri et al. (2002), additional features do not play a significant role may be understood as a consequence of the fact that in this work the training set is much larger and more complete than in these earlier works, and therefore the color parameter space is (on average, but see below) better mapped.
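The feature-significance procedure of §4.1 (a fresh random 60/20/20 split per experiment, N + 1 runs, robust $\sigma_3$ on the test set) can be sketched as below. To keep the example short and runnable, a least-squares linear fit stands in for the 24-neuron MLP actually used, the 3σ clipping is assumed to be a single pass, and the data are synthetic; none of this reproduces the numbers of Table 2.

```python
import numpy as np


def split_60_20_20(n, rng):
    """Random disjoint index sets: 60% training, 20% validation, 20% test."""
    idx = rng.permutation(n)
    n_tr, n_va = int(0.6 * n), int(0.2 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]


def robust_sigma(z_phot, z_spec, k=3.0):
    """Std. dev. of the residuals after one pass of k-sigma clipping (assumed recipe)."""
    d = z_phot - z_spec
    keep = np.abs(d - d.mean()) <= k * d.std()
    return d[keep].std()


def ablation(X, z, fit, predict, rng):
    """sigma_3 with all features, then with each feature removed in turn (N + 1 runs),
    drawing a fresh random split for every run."""
    results = {}
    for drop in [None] + list(range(X.shape[1])):
        cols = [j for j in range(X.shape[1]) if j != drop]
        tr, _va, te = split_60_20_20(len(z), rng)   # validation set unused by this toy model
        model = fit(X[tr][:, cols], z[tr])
        key = "all" if drop is None else f"all but {drop + 1}"
        results[key] = robust_sigma(predict(model, X[te][:, cols]), z[te])
    return results


def fit_linear(X, z):
    """Stand-in for the MLP: least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(A, z, rcond=None)[0]


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))             # toy features; feature 3 is irrelevant
z = 0.1 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(scale=0.01, size=1000)
print(ablation(X, z, fit_linear, predict_linear, rng))
```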



5. The evaluation of photometric redshifts 



One preliminary consideration: as first pointed out by Connolly et al. (1995), when working in the near and intermediate redshift universe (z < 1), the most relevant broad-band features are the Balmer break at 4000 Å and the shape of the continuum in the near UV. Near-IR bands become relevant only at higher redshift, and this is the main reason why we decided to concentrate on the near universe (z < 0.5), where the SDSS optical bands provide enough spectral coverage.

One additional reason comes from the redshift distribution of the objects in the SpS-DR4, shown in Fig. 2 (solid line). As can be clearly seen, the histogram presents a sharp discontinuity at z ~ 0.25 (86% of the objects have z < 0.25 and only 14% are at higher redshift), and in practice no objects are present at z > 0.5.
Fig. 2. — Distribution of redshifts in the SpS sample. Solid line: GG sample. Dashed line: non-LRG sample. Dotted line: LRG sample (see text for details). Notice the sharp drop at z ~ 0.25.
Fig. 3. — Distribution of the objects in the GG sample versus the r magnitude. We plot the LRG objects as red dots.
In Fig. 2 we also plot as a dotted line the redshift distribution of the galaxies in the SpS dataset which match the LRG photometric selection criteria. As can be seen, within the tail at z > 0.25 only a very small fraction (11.4%) of the objects do not match the LRG selection criteria. In Fig. 3 we plot the redshift of objects belonging to the GG sample against their r-band magnitude: red dots represent those galaxies which have been a posteriori identified as LRGs. As is clearly seen, the overall distribution at redshift < 0.25 drops dramatically at r ~ 17.7, due to the selection criteria of the SDSS spectroscopic survey. At higher redshift, namely z > 0.25, the galaxy distribution is dominated by LRGs with few contaminants and extends to much fainter magnitudes. Nevertheless, LRGs are systematically brighter than GG galaxies over the whole redshift interval z < 0.50.

Such a large inhomogeneity in the density and nature of the training data poses severe constraints on any empirical method, since the different weights of the samples extracted in the different redshift bins would lead either to overfitting in the densest regions or to the opposite effect in the less populated ones. Furthermore, the dominance of LRGs at z > 0.25 implies that in this redshift range the base of knowledge offers a poor coverage of the parameter space.



The first problem can be solved by taking into account the fact that, as shown by Tagliaferri et al. (2002) and Firth et al. (2003), NNs work properly even with scarcely populated training sets, and by building a training set which uniformly samples the parameter space or, in other words, which equally weights different clusters of points (notice that in this paper we use the word cluster in the statistical sense, i.e. to denote a statistically significant aggregation of points in the parameter space). In the present case, the dominance of LRGs at high redshifts renders the parameter space heavily undersampled.



In fact, as will be shown in Paper III (D'Abrusco et al. 2006), a more detailed analysis of the parameter space shows that at high redshift the objects group into one very large structure containing more than 90% of the data points, plus several dozens of much smaller clusters.



5.1. The nearby and intermediate redshift samples

In order to tackle the above-mentioned problems, we adopted a two-step approach: first we trained a network to recognize nearby (i.e. with z < 0.25) and distant (z > 0.25) objects, then we trained two separate networks to work in the two different redshift regimes. This approach ensures that the NNs achieve good generalization capabilities in the nearby sample and confines the biases mainly to the distant one. To perform the separation between nearby and distant objects, we extracted from the SDSS-4 SpS training, validation and test sets containing, respectively, 60%, 20% and 20% of the total number of objects (449,370 galaxies). The resulting test set, therefore, consisted of 89,874 randomly extracted objects. Extensive testing of the network architecture (each experiment was performed on a separate random extraction of the training, validation and test sets) led to an MLP with 18 neurons in 1 hidden layer. This NN achieved the best performance after 110 epochs, and the results are detailed, in the form of a confusion matrix, in Table 3.
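The random extraction of the training, validation and test sets described above can be sketched as a random permutation of object indices. `random_split` is a hypothetical helper; the paper does not state how the extraction was implemented. With the 449,370 galaxies quoted above, a 60/20/20 split reproduces a test set of 89,874 objects.

```python
import numpy as np


def random_split(n_objects, fractions=(0.6, 0.2, 0.2), seed=None):
    """Randomly partition object indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_objects)
    n_train = int(round(fractions[0] * n_objects))
    n_valid = int(round(fractions[1] * n_objects))
    train = perm[:n_train]
    valid = perm[n_train:n_train + n_valid]
    test = perm[n_train + n_valid:]  # remainder, about fractions[2] of the objects
    return train, valid, test


train, valid, test = random_split(449_370, seed=1)
print(len(train), len(valid), len(test))  # 269622 89874 89874
```

Passing a different `seed` to each experiment corresponds to the separate random extraction used for every architecture test.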

As can be seen, this first NN is capable of separating the two classes of objects with an overall efficiency of 97.52%, with slightly better performance in the nearby sample (98.59%) and slightly worse performance in the distant one (92.47%).
Table 3. Confusion matrix for the "nearby-distant" test set.

                 SDSS nearby    SDSS far
    NN nearby        76498          1096
    NN far            1135         11145
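The efficiencies quoted in the text can be recomputed from Table 3. The sketch reads the flattened table with rows as the NN classification and columns as the SDSS spectroscopic class (our reading of the layout). The overall efficiency is the trace divided by the total, and the nearby figure corresponds to the fraction of NN-nearby objects that are spectroscopically nearby.

```python
import numpy as np

# Rows: NN class (nearby, far); columns: SDSS spectroscopic class (nearby, far).
confusion = np.array([[76498, 1096],
                      [1135, 11145]])


def overall_accuracy(cm):
    """Fraction of test objects placed in the correct class."""
    return np.trace(cm) / cm.sum()


def row_purity(cm):
    """Fraction of objects assigned by the NN to each class that truly belong to it."""
    return np.diag(cm) / cm.sum(axis=1)


print(confusion.sum())                               # 89874
print(f"{overall_accuracy(confusion):.4f}")          # 0.9752
print(f"{row_purity(confusion)[0]:.4f}")             # 0.9859
```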
In Fig. 4 we plot, as a function of redshift, the percentage (computed in redshift bins) of objects in the test set which were misclassified (i.e. objects belonging to the nearby sample which were erroneously attributed to the distant one and vice versa). The distribution appears fairly constant from z_spec ~ 0.05 to z_spec ~ 0.45, while higher (but still negligible with respect to the total number of objects in the sample) percentages are found at the extremes.




Fig. 4. — Percentage distribution of misclassified objects of the GG sample, normalized to the total number of galaxies in each redshift bin (x-axis: z_spec, from 0.0 to 0.5).
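The quantity plotted in Fig. 4 is, for each redshift bin, the number of misclassified test objects divided by the total number of test objects in that bin. A minimal sketch, assuming boolean nearby/distant labels and user-chosen bin edges (the bin width used for Fig. 4 is not stated); `misclassified_percentage` is a hypothetical name.

```python
import numpy as np


def misclassified_percentage(z_spec, true_distant, pred_distant, bin_edges):
    """Percentage of misclassified objects per z_spec bin, normalized to the bin population."""
    z_spec = np.asarray(z_spec, dtype=float)
    wrong = np.asarray(true_distant) != np.asarray(pred_distant)
    total, _ = np.histogram(z_spec, bins=bin_edges)
    bad, _ = np.histogram(z_spec[wrong], bins=bin_edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        # empty bins are reported as NaN rather than 0%
        return np.where(total > 0, 100.0 * bad / total, np.nan)


z = np.array([0.05, 0.06, 0.30, 0.31])
truth = z > 0.25                              # spectroscopic class
pred = np.array([False, True, True, True])    # network output
pct = misclassified_percentage(z, truth, pred, [0.0, 0.25, 0.5])
print(pct)  # 50% in the first bin, 0% in the second
```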
Notice that, when using photometric data alone, the absence of training data at z > 0.5 makes it impossible to evaluate the fraction of contaminants with z > 0.5 which are erroneously attributed to the distant sample. However, given the adopted magnitude cuts, this number may be safely assumed to be negligible.



5.2. The photometric redshifts 

Once the first network has separated the nearby and distant objects, we can proceed to derive the photometric redshifts, working separately in the two regimes. Since NNs are excellent at interpolating data but very poor at extrapolating them, we adopted the following procedure in order to minimize the systematic errors at the extremes of the training redshift ranges.

For the nearby sample we trained the network using objects with spectroscopic redshift in 
the range [0.0,0.27] and then considered the results to be reliable in the range [0.01,0.25]. 
In the distant sample, instead, we trained the network over the range [0.23, 0.50] and then 
considered the results to be reliable in the range [0.25, 0.48]. 
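The padding of each training window with respect to the interval in which results are trusted can be expressed as two masks per regime. The ranges below are those quoted in the text. Treating the limits as inclusive and applying the reliability cut to the network output are our assumptions, and the helper names are hypothetical.

```python
import numpy as np

# Ranges quoted in the text: each network is trained on a window extending beyond
# the interval in which its output is considered reliable, so that the reliable
# interval never requires extrapolation.
WINDOWS = {
    "nearby": {"train": (0.00, 0.27), "reliable": (0.01, 0.25)},
    "distant": {"train": (0.23, 0.50), "reliable": (0.25, 0.48)},
}


def _in_range(z, lo, hi):
    z = np.asarray(z, dtype=float)
    return (z >= lo) & (z <= hi)  # inclusive limits (assumption)


def training_mask(z_spec, regime):
    """Objects whose spectroscopic redshift falls in the training window."""
    return _in_range(z_spec, *WINDOWS[regime]["train"])


def reliable_mask(z_phot, regime):
    """Network outputs falling in the interval considered reliable."""
    return _in_range(z_phot, *WINDOWS[regime]["reliable"])


z = np.array([0.005, 0.10, 0.24, 0.26, 0.49])
print(training_mask(z, "nearby"))   # [ True  True  True  True False]
print(training_mask(z, "distant"))  # [False False  True  True  True]
```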

In order to select the optimal NN architecture, extensive testing was performed by varying the network parameters, and for each test the training, validation and test sets were randomly extracted from the SpS. The results of the Bayesian learning of the NNs were found to depend on the number of neurons in the hidden layer: for the GG sample, performance was best with 24 hidden neurons for both the nearby and the distant samples, while for the LRG sample the optimal values were 24 and 25, respectively. In Fig. 5 we show, as a function of the number of hidden neurons, the trends of the interquartile errors and of the robust dispersion obtained for the nearby and distant samples (GG in the upper panel, LRG in the lower panel).
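The architecture search amounts to repeating the training on fresh random extractions for each candidate number of hidden neurons, then keeping the size with the best average score. A sketch, assuming a user-supplied `train_and_score(n_hidden, seed)` that trains an MLP and returns a lower-is-better figure such as the robust sigma on the test set. This callback and the toy score below are hypothetical stand-ins, not the Bayesian MLP training used in the paper.

```python
import numpy as np


def scan_hidden_units(candidates, train_and_score, n_repeats=3, seed=0):
    """Average a lower-is-better score over repeated random extractions per size."""
    rng = np.random.default_rng(seed)
    mean_scores = {}
    for n_hidden in candidates:
        # a fresh seed per repetition stands for a new random train/valid/test split
        scores = [train_and_score(n_hidden, int(rng.integers(2**31)))
                  for _ in range(n_repeats)]
        mean_scores[n_hidden] = float(np.mean(scores))
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores


def toy_score(n_hidden, seed):
    """Hypothetical stand-in for training: minimum at 24 neurons plus small noise."""
    noise = np.random.default_rng(seed).normal(0.0, 1e-7)
    return 0.02 + 1e-5 * (n_hidden - 24) ** 2 + noise


best, scores = scan_hidden_units(range(18, 31), toy_score)
print(best)  # 24
```

Averaging over repetitions keeps a single lucky split from selecting the architecture, which matters when neighbouring sizes differ only slightly.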
Fig. 5. — Upper panel: GG sample, trend of the interquartile error and of the robust σ as a function of the number N of neurons in the hidden layer. The nearby and distant samples are plotted separately. Lower panel: the same as above for the LRG sample.
For the GG sample, in the best experiment, the robust variance turned out to be σ_3 = 0.0208 over the whole redshift range, and 0.0197 and 0.0245 for the nearby and distant objects, respectively. As far as the LRG sample is concerned, we obtained σ_3 ~ 0.0163 over the whole range, and σ_3 ~ 0.0154 and σ_3 ~ 0.0189 for the nearby and distant samples, respectively. In the upper panels of Figs. 6 and 7 we plot the spectroscopic versus the photometric redshifts for the GG and the LRG samples, respectively. Since the huge number of points would make it difficult to see the trends in the densest regions, we preferred to plot the data using isocontours (with a step of 0.02 times the maximum data point density).
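The text does not spell out how σ_3 is computed. Judging from the "3σ criterion" mentioned in Section 7 and from the comparison with the 'iterated' σ of Csabai et al. (2003), we assume here an iterative 3σ clipping of the residuals z_phot - z_spec. The sketch returns the clipped standard deviation, the clipped mean and the fraction of rejected objects; `robust_sigma` is a hypothetical name.

```python
import numpy as np


def robust_sigma(residuals, nsig=3.0, max_iter=50):
    """Iterative n-sigma clipping; returns (sigma, mean, rejected fraction)."""
    r = np.asarray(residuals, dtype=float)
    keep = np.ones(r.size, dtype=bool)
    for _ in range(max_iter):
        mu, sigma = r[keep].mean(), r[keep].std()
        new_keep = np.abs(r - mu) <= nsig * sigma
        if np.array_equal(new_keep, keep):
            break  # converged: the clipped set no longer changes
        keep = new_keep
    mu, sigma = r[keep].mean(), r[keep].std()
    return sigma, mu, 1.0 - keep.mean()


# Toy residuals: Gaussian core with sigma = 0.02 plus 1% catastrophic outliers.
rng = np.random.default_rng(42)
res = np.concatenate([rng.normal(0.0, 0.02, 99_000), np.full(1_000, 0.5)])
sigma, mu, rejected = robust_sigma(res)
print(f"sigma={sigma:.4f} mean={mu:.4f} rejected={rejected:.3f}")
```

Clipping also removes the far Gaussian tails, so on purely Gaussian residuals this estimator comes out slightly (about 1 to 2%) below the true σ.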
Fig. 6. — Upper panel: photometric versus spectroscopic redshifts for the objects in the GG test set. The continuous lines are iso-density contours increasing with a step of 2% of the maximum density. The crosses mark the average value of the photometric redshifts in a given spectroscopic redshift bin (see text), while the error bars give the robust variance σ_3. Lower panel: same as above after the correction for the systematic trends via interpolation (see text).
Fig. 7. — Same as in Fig. 6 for the LRG sample.
The mean values of the residuals are -0.0036 and -0.0029 for the GG and the LRG samples, respectively. These figures alone, however, are not very significant, since systematic trends are clearly present in the data, as shown in Figs. 8 and 9, where we plot for each 0.05 redshift bin the average value of the photometric redshifts and the robust sigma of the residuals.
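The binned statistics used to expose the systematic trends (0.05 wide bins in spectroscopic redshift, with the mean photometric redshift and the dispersion of the residuals in each bin) can be sketched as follows. For brevity the sketch uses a single-pass 3σ clip per bin rather than an iterated robust sigma; that simplification, and the name `binned_trend`, are ours.

```python
import numpy as np


def binned_trend(z_spec, z_phot, width=0.05, z_max=0.5, nsig=3.0):
    """Bin centres, mean z_phot and clipped residual dispersion per z_spec bin."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    edges = np.arange(0.0, z_max + width / 2, width)
    idx = np.digitize(z_spec, edges) - 1  # bin index of each object
    centres, mean_phot, sigma = [], [], []
    for b in range(len(edges) - 1):
        sel = idx == b
        if sel.sum() < 2:
            continue  # skip (nearly) empty bins
        res = z_phot[sel] - z_spec[sel]
        # single-pass 3-sigma clip, a simplification of an iterated robust sigma
        keep = np.abs(res - res.mean()) <= nsig * res.std()
        centres.append(0.5 * (edges[b] + edges[b + 1]))
        mean_phot.append(z_phot[sel].mean())
        sigma.append(res[keep].std())
    return np.array(centres), np.array(mean_phot), np.array(sigma)


# Toy data with a constant bias of +0.01 and no scatter.
zs = np.linspace(0.001, 0.499, 5000)
centres, mean_phot, sig = binned_trend(zs, zs + 0.01)
```

With real residuals, a systematic trend appears as `mean_phot` departing from `centres` in some bins.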
Fig. 8. — Histograms of residuals for the GG sample in slices of redshift. Upper panels: before the correction. Lower panels: after the correction.
Fig. 9. — Same as in the previous figure, but for the LRG sample.
6. Interpolative correction



The most significant deviations, as could be expected (Connolly et al. 1995), are clearly visible in the nearby sample for z < 0.1 and in the distant sample at z ~ 0.4. The first feature is due to the fact that at low redshifts faint nearby galaxies cannot be easily disentangled from luminous, more distant objects having the same color. The second one is instead due to a degeneracy in the SDSS photometric system introduced by a small gap between the g and r bands: at z ~ 0.4 the Balmer break falls into this gap and its position becomes ill defined (Padmanabhan et al. 2005).



It needs to be stressed, however, that these trends represent a rather normal behavior for empirical methods, which has already been explicitly noted in Tagliaferri et al. (2002) and Vanzella et al. (2004) and is clearly visible (even when not explicitly mentioned) in almost all photometric redshift data sets (Wadadekar 2005) available so far for the SDSS.



In order to minimize the effects of such systematic trends, at the risk of a slight increase in the variance of the final catalogues, we applied to both data sets an interpolative correction computed separately in the two redshift intervals. We used a χ² fit to find, separately in each redshift regime, the polynomials which best fit the average points. These polynomials (of the fourth and fifth order, respectively) turned out to be, for the GG sample:



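$$P_4(z) = 0.005 + 1.570\,z - 12.577\,z^2 + 78.948\,z^3 - 157.961\,z^4 \qquad (4)$$

$$P_5(z) = 12.15 - 178.2\,z + 1039.3\,z^2 - 2959.0\,z^3 + 4135.5\,z^4 - 2271.3\,z^5 \qquad (5)$$

and for the LRG sample:

$$P_4(z) = 0.011 + 0.885\,z - 1.820\,z^2 + 21.350\,z^3 - 53.159\,z^4 \qquad (6)$$

$$P_5(z) = 13.1 - 192.5\,z + 1123.3\,z^2 - 3207.2\,z^3 + 4504.5\,z^4 - 2491.6\,z^5 \qquad (7)$$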



Thus, the correction to be applied is:

$$z'_{\rm phot} = z_{\rm phot} - \left(\hat{z}_{\rm phot} - z_{\rm spec}\right) \qquad (8)$$

where $\hat{z}_{\rm phot} = P_4(z_{\rm spec})$ for nearby objects and $\hat{z}_{\rm phot} = P_5(z_{\rm spec})$ for the distant ones.

Obviously, when applying this method to objects for which we do not possess any spectroscopic estimate of the redshift, it is impossible to perform the transformation of Eq. (8) to correct the NN z_phot estimates for systematic trends, and we are obliged to use an approximation. That is, we replace the unknown z_spec with z_phot in Eq. (8), obtaining the relation:

$$z'_{\rm phot} = z_{\rm phot} - \left(\hat{z}_{\rm phot} - z_{\rm phot}\right) \qquad (9)$$

where $\hat{z}_{\rm phot} = P_4(z_{\rm phot})$ or $\hat{z}_{\rm phot} = P_5(z_{\rm phot})$, depending on the redshift range.
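Equations (4) to (9) translate directly into a few lines of NumPy. In this sketch each coefficient list is read as ordered by increasing power of z, which is the reading under which P(z) stays close to z over the fitted ranges (e.g. P_4(0.1) ≈ 0.099 for the GG sample). That reading, the helper names and the use of `numpy.polynomial.polynomial.polyfit` with weights 1/σ for the χ² fit are our assumptions. For an object without spectroscopy, the choice between the nearby and distant polynomial is made here by a boolean `is_distant` flag, standing in for the output of the nearby/distant classifier.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# GG-sample coefficients, Eqs. (4)-(5), lowest power first (assumed ordering).
GG_NEARBY = [0.005, 1.570, -12.577, 78.948, -157.961]
GG_DISTANT = [12.15, -178.2, 1039.3, -2959.0, 4135.5, -2271.3]


def fit_trend(z_centres, mean_z_phot, sigma, degree):
    """Chi^2 fit of a polynomial to the binned average points (weights 1/sigma)."""
    w = 1.0 / np.asarray(sigma, dtype=float)
    return P.polyfit(z_centres, mean_z_phot, degree, w=w)


def correct(z_phot, coeffs):
    """Eq. (9): z'_phot = z_phot - (P(z_phot) - z_phot), z_spec replaced by z_phot."""
    z_phot = np.asarray(z_phot, dtype=float)
    return z_phot - (P.polyval(z_phot, coeffs) - z_phot)


def correct_by_regime(z_phot, is_distant, near=GG_NEARBY, far=GG_DISTANT):
    """Apply the nearby or distant polynomial according to a per-object flag."""
    z_phot = np.asarray(z_phot, dtype=float)
    return np.where(np.asarray(is_distant, dtype=bool),
                    correct(z_phot, far), correct(z_phot, near))


print(correct(0.1, GG_NEARBY))  # about 0.1006
```

Because z_phot stands in for z_spec, the correction is only as good as the assumption stated next in the text, namely that the z_phot distribution approximates the unknown z_spec distribution.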
This is equivalent to assuming that the NN z_phot distribution represents, to a good approximation, the underlying and unknown z_spec distribution. After this correction we obtain a robust variance σ_3 = 0.0197 for the GG sample and 0.0164 for the LRG sample, computed in both cases over the whole redshift range; the resulting distributions for the two samples are shown in the lower panels of Figs. 6 and 7.



7. Discussion of the systematics and of the errors 



As noticed by several authors (see for instance Schneider et al. 2006; Padmanabhan et al. 2005), while some tolerance can be accepted on the amplitude of the redshift error, the uncertainties about the probability distribution of those errors are much more critical. This aspect is crucial since (Padmanabhan et al. 2005) the observed redshift distribution is related to the true redshift distribution via a Fredholm equation which is ill defined and strongly dependent on the accuracy with which the noise can be modeled. In this respect, many recent studies on the impact of redshift uncertainties on various cosmological aspects are available: dark energy from supernovae studies and cluster number counts (Huterer et al. 2004); weak lensing (Bernstein & Jain 2004; Huterer et al. 2006; Ishak 2005; Ma et al. 2006); baryon oscillations (Zhan 2006; Zhan & Knox 2006). All these studies model the error distribution as Gaussian.
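In its usual form the relation reads n_obs(z_phot) = ∫ p(z_phot | z) n_true(z) dz, a Fredholm integral equation of the first kind whose kernel p is the redshift error distribution. The sketch below shows why the inversion is ill posed. It assumes a Gaussian kernel of constant width and a toy n_true on a uniform grid, all of which are illustrative choices of ours. The forward convolution is benign, but the discretized kernel matrix is numerically close to singular, so recovering n_true from noisy counts strongly amplifies the noise.

```python
import numpy as np


def gaussian_kernel_matrix(z_grid, sigma_z):
    """K[i, j] approximates p(z_phot_i | z_true_j) * dz for Gaussian errors."""
    dz = z_grid[1] - z_grid[0]
    diff = z_grid[:, None] - z_grid[None, :]
    return np.exp(-0.5 * (diff / sigma_z) ** 2) / (np.sqrt(2 * np.pi) * sigma_z) * dz


z = np.linspace(0.0, 0.5, 201)
K = gaussian_kernel_matrix(z, sigma_z=0.02)
n_true = np.exp(-0.5 * ((z - 0.2) / 0.05) ** 2)  # toy true distribution
n_obs = K @ n_true                                # forward problem: smooth, well behaved
cond = np.linalg.cond(K)                          # huge: the inverse problem is ill posed
print(f"condition number ~ {cond:.1e}")
```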

However, owing to spectral-type/redshift degeneracies, photometric redshift errors often have bimodal distributions, with a smaller peak separated from a larger peak by Δz of order unity (Benítez 2000; Fernández-Soto et al. 2001, 2002), or even more complex error distributions, as can be seen in Fig. 8 for the GG sample.
Fig. 10. — Distribution of the residuals versus spectroscopic redshift after the correction for systematic trends. Upper panels: GG nearby and distant samples. Lower panels: LRG nearby and distant samples. The central line marks the average value of the residuals. The 1σ and 2σ confidence levels are also shown.
In order to evaluate the robustness of the robust σ, several instances of the process were applied to different randomly selected training, validation and test sets, and the robust sigma was found to vary only in the fourth significant digit. Small differences were found only in the identification of catastrophic objects, whose frequency, however, did not vary significantly.

The distribution of the residuals as a function of the spectroscopic redshift for the GG and LRG samples is shown in Fig. 10, separately for the near and distant objects. We have also studied the dependence of such residuals on the r-band luminosity of the galaxies in the two magnitude ranges r < 17.7 and r > 17.7 (cf. EJ), and in the near and intermediate redshift bins, as shown in Figs. 12 and 13 for the GG and LRG galaxies, respectively. Clear systematics are found only for the residuals of near/faint and intermediate/luminous LRGs: in the former case, the mean value of the residual z_phot - z_spec is systematically higher than 0, while in the latter it is constantly biased toward negative values. Both cases can be explained by recalling that these galaxies occupy a poorly sampled volume of the parameter space, and therefore the NN fails to reproduce the exact trend of the spectroscopic redshift.
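The split of the residuals by redshift regime and r-band magnitude (r < 17.7 versus r ≥ 17.7) reduces to masked means. In this hypothetical helper the nearby/distant assignment is passed in as a boolean flag, and objects with r exactly equal to the split value are assigned to the faint bin (our choice). A systematic offset shows up as a mean residual z_phot - z_spec clearly different from zero in one of the four cells.

```python
import numpy as np


def residual_means(z_spec, z_phot, r_mag, is_distant, r_split=17.7):
    """Mean residual z_phot - z_spec in each (redshift regime, magnitude bin) cell."""
    res = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    r_mag = np.asarray(r_mag, dtype=float)
    is_distant = np.asarray(is_distant, dtype=bool)
    cells = {}
    for regime, in_regime in (("nearby", ~is_distant), ("distant", is_distant)):
        for label, in_mag in ((f"r<{r_split}", r_mag < r_split),
                              (f"r>={r_split}", r_mag >= r_split)):
            sel = in_regime & in_mag
            cells[(regime, label)] = res[sel].mean() if sel.any() else np.nan
    return cells


cells = residual_means(z_spec=[0.10, 0.10, 0.30, 0.30],
                       z_phot=[0.12, 0.09, 0.28, 0.31],
                       r_mag=[17.0, 18.0, 17.0, 18.0],
                       is_distant=[False, False, True, True])
for key, value in cells.items():
    print(key, round(value, 3))
```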

In Fig. 11 we show the same plot as in the lower panel of Fig. 6, but without isocontours and with the objects which "a posteriori" were labeled as members of the LRG sample plotted as red dots. Interestingly enough, in the nearby sample the non-LRG and the LRG objects have robust variances of σ_3 = 0.021 and σ_3 = 0.020. Notice, however, that the LRG objects show a clear residual systematic trend. This behaviour can be explained by the fact that in the nearby sample the training set contains a large enough number of examples of both kinds of objects, and the network can therefore achieve a good generalization capability. In the distant sample, instead, the non-LRG and LRG objects have robust variances σ_3 = 0.321 and σ_3 = 0.021. Also in this case the observed behavior can be easily explained as due to the heavy bias toward the LRGs, which form ~ 88.5% of the sample. It must be stressed that, while the remaining 11.4% of the objects still constitute a fairly large sample, the uneven distribution of the training data between the two groups of objects overtrains the NN toward the LRG objects, which are therefore much better traced.



This confirms what has already been found by several authors (Padmanabhan et al. 2005): besides an accurate evaluation of the errors, the derivation of photometric redshifts also requires the identification of a homogeneous sample of objects.
Fig. 11. — Plot of the same data shown in the lower panel of Fig. 6, with the LRG objects marked as red dots.
The objects not matching the 3σ criterion used for the robust variance are 3.47% of the LRG sample and 3.18% of the GG sample. Before the correction, the rejected points are ~ 2% of the overall distribution for the GG sample and ~ 1.8% for the LRG one.

As already mentioned, the SDSS data set has been extensively analyzed by several authors who have used different methods for photometric redshift determination. Unfortunately, a direct comparison is not always possible due to differences either in the data sets (different data releases have been used) or in the way errors were estimated. It must be stressed, however, that since above a minimum and reasonably low threshold the NN performance is not much affected by the number of objects in the training set, the former factor can be safely neglected. So far, the most extensive works are those by Csabai et al. (2003) and Way & Srivastava (2006). In the former, various methods were tested against the EDR data. With reference to their Table 3, and using the 'iterated' σ, which almost coincides with the robust variance adopted here, we find that among the SED fitting methods the best performances were obtained for the BC synthetic spectra (σ ~ 0.0621 and σ ~ 0.0306 for the GG and LRG samples, respectively). This method, however, leads to very clear systematic trends and to a large number of catastrophic outliers (~ 3.5%). Much better performances were attained by empirical methods and, in particular, by the interpolative one, which leads to σ ~ 0.0273 with a fraction of catastrophic redshifts of only 2%. In Way & Srivastava (2006) the authors made use of an Ensemble of NNs (E) and of Gaussian Process Regression (GP). Their best results using the magnitudes only were 0.0205 and 0.0230 for the E and GP methods, respectively, and, unlike our method, their methods benefit greatly from the use of additional parameters such as the Petrosian radii, the concentration index and the shape parameter.

Two points are worth stressing. First of all, their selection criteria for the construction of the training set appear much more restrictive, and it is not clear what performance could be achieved should such restrictions be relaxed. Second, even though such an 'ensemble' approach is very promising and is likely to be the most general one, the bagging procedure used in Way & Srivastava (2006) to combine the NNs is known to be very effective only in those cases where the intrinsic variance of the adopted machine learning model is high. In this specific case (a large number of training data and few input features), the NNs are very stable, and therefore other combining procedures, such as AdaBoost (Freund et al. 2003), should be preferred (Dietterich 2002). This might also be the reason why, when only the photometric parameters are used, their method gives slightly worse performance than ours, while it leads to better results when the number of features is increased.
Fig. 12. — Distribution of residuals for the GG sample divided into magnitude bins. Upper left panel: nearby sample, r < 17.7; upper right panel: nearby sample, r > 17.7; lower left panel: distant sample, r < 17.7; lower right panel: distant sample, r > 17.7.

Fig. 13. — Distribution of residuals for the LRG sample divided into magnitude bins. Upper left panel: nearby sample, r < 17.7; upper right panel: nearby sample, r > 17.7; lower left panel: distant sample, r < 17.7; lower right panel: distant sample, r > 17.7.
An additional machine learning approach, namely Support Vector Machines, was used by Wadadekar (2005). In Table 4 we briefly summarize the main results of the above-quoted papers.



7.1. Contamination by distant galaxies 

The fact that our NNs are trained on a sample of galaxies with observed redshift z_spec < 0.5 introduces some contamination from objects which, even though at z > 0.5, still have r < 21 and therefore match the photometric selection criteria.

The only possible way to avoid such an effect would be to use a knowledge base covering, in a uniform way, all significant regions of the photometric parameter space down to the adopted magnitude limit. In the case of SDSS this is true for magnitudes brighter than 17.7, but not at fainter light levels, where the only region uniformly covered by the spectroscopic subsample is that defined by the LRG selection criteria. A possible way out could be to extend the base of knowledge to fainter light levels by including statistically significant and complete samples of spectroscopic redshifts from other, deeper surveys. The feasibility of using a third NN to classify (and eventually discard) objects having z > 0.5 is under study. At the moment, however, since we are interested in validating the method and in producing catalogues to be used for statistical applications, we shall estimate the number and the magnitude distribution of such contaminants on statistical grounds only, using the r-band luminosity function derived from SDSS data by Blanton et al. (2003). This function, in fact, allows us to derive, for any given absolute magnitude, the number of objects which, even though at a redshift larger than 0.5, still match our apparent magnitude threshold and are thus misclassified. By integrating over the absolute magnitude and over the volume covered by the survey we obtain the curve in Fig. 14, which corresponds to a total number of contaminants of ~ 3.74 × 10^6. It has to be noticed, however, that for magnitudes brighter than 20.5 the fraction of contaminants is less than 0.04, and it drops below 0.01 for r < 20.
Fig. 14. — Estimated distribution of contaminants as a function of the apparent r magnitude. The y-axis gives the expected fraction of objects at z > 0.5 which are erroneously evaluated by our procedure.
Table 4. Comparison of various methods for photometric redshift estimation applied to the SDSS data.

Reference                    Method                       Data      Δz        σ        Range
Csabai et al. (2003)         SED fitting CWW              EDR                 0.0621
Csabai et al. (2003)         SED fitting BC               EDR                 0.0509
Csabai et al. (2003)         interpolative                EDR                 0.0451
Csabai et al. (2003)         bayesian                     EDR                 0.0402
Csabai et al. (2003)         empirical, polynomial fit    EDR                 0.0318
Csabai et al. (2003)         K-D tree                     EDR                 0.0254
Suchkov et al. (2005)        Class X                      DR-2                0.0340
Way & Srivastava (2006) (a)  Gaussian Process             DR-3                0.0230
Way & Srivastava (2006) (a)  ensemble                     DR-3                0.0205
Collister & Lahav (2004)     ANNz                         EDR                 0.0229
Wadadekar (2005)             SVM                          DR-2                0.027
Wadadekar (2005) (a)         SVM                          DR-2                0.024
Vanzella et al. (2004)       MLP ff                       DR1       0.016     0.022    < 0.4
Padmanabhan et al. (2005)    Template fitting and Hybrid  DR1-LRG   < 0.01    ~ 0.035  < 0.55
this work, before int.       MLP                          DR5-GG    -0.0036   0.0197   [0.01, 0.25]
this work, before int.       MLP                          DR5-GG    -0.0036   0.0245   [0.25, 0.48]
this work, after int.        MLP                          DR5-GG                       [0.01, 0.25]
this work, after int.        MLP                          DR5-GG                       [0.25, 0.48]
this work, before int.       MLP                          DR5-LRG   -0.0029   0.0194   [0.01, 0.25]
this work, before int.       MLP                          DR5-LRG   -0.0029   0.0205   [0.25, 0.48]

Note. — Column 1: reference; Column 2: method (for the acronyms see text); Column 3: data set (EDR = Early Data Release; DR1 through DR5: the various SDSS data releases); Column 4: systematic offset; Column 5: standard deviation; Column 6: redshift range over which the average error is estimated. (a): additional morphological and photometric parameters.
8. The catalogues

As mentioned above, the catalogues containing the photometric redshifts, together with the parameters used for their derivation, can be downloaded at the URL: http://people.na.infn.it/astroneural/SDSSredshifts.htm/. For consistency with the SDSS survey, the data have been subdivided into several files, each corresponding to a different SDSS stripe of the observed sky. A stripe is defined by a line of constant survey latitude η, bounded on the north and south by the edges of the two strips (scans along a constant η value), and bounded on the east and west by lines of constant λ. Because both strips and stripes are defined in "observed" space, they are rectangular areas which overlap as one approaches the poles (for more details see http://www.sdss.org). The data for both the GG and LRG samples have been extracted using the queries described in HI. The catalogues can be downloaded as FITS files, containing the fundamental parameters used for redshift determination and the estimated photometric redshift for each individual source. In more detail (SDSS database names of the parameters in brackets): unique SDSS identifier ('objID'), right ascension J2000 ('ra'), declination J2000 ('dec'), dereddened magnitudes ('dered_u', 'dered_g', 'dered_r', 'dered_i', 'dered_z'), radius containing 50% of the Petrosian flux in each band ('petroR50_u', 'petroR50_g', 'petroR50_r', 'petroR50_i', 'petroR50_z'), radius containing 90% of the Petrosian flux in each band ('petroR90_u', 'petroR90_g', 'petroR90_r', 'petroR90_i', 'petroR90_z'), de Vaucouleurs fit ln(likelihood) in the u and r bands ('lnLDeV_u', 'lnLDeV_r'), and exponential disc fit ln(likelihood) in the u and r bands ('lnLExp_u', 'lnLExp_r').

9. Conclusions 

In the previous paragraphs we discussed a 'two-step' application of neural networks to the evaluation of photometric redshifts. Even though finely tailored to the characteristics of the SDSS, the method is completely general and can be easily applied to any other multiband data set, provided that a suitable base of spectroscopic knowledge is available. As with most other neural network methods, several advantages are evident:

1. The NN can be easily re-trained if new data become available. Even though the training phase can be rather demanding in terms of computing time, once the NN has been trained the derivation of redshifts is almost immediate (10^7 objects are processed on the fly on a normal laptop).

2. Even though it was not necessary in this specific case, all sorts of a priori knowledge 
can be taken into account. 
On the other hand, the method suffers from the limitations typical of all empirical methods based on interpolation. Above all, the training set needs to ensure a complete and, if possible, uniform coverage of the parameter space.

Our method allowed us to derive photometric redshifts for z < 0.5 with robust variances of σ_3 = 0.0208 for the GG sample (σ_3 = 0.0197 and σ_3 = 0.0238 for the nearby and distant samples, respectively) and σ_3 = 0.0164 for the LRG sample (σ_3 = 0.0160 and σ_3 = 0.0183). This accuracy was reached using a two-step approach, which allows training sets to be built that uniformly sample the parameter space of the overall population.

In the case of LRGs, the better accuracy and the near-Gaussianity of the residuals are explained by the fact that this sample was selected based on the a priori assumption that LRGs form a rather homogeneous population sharing the same SED. In other words, this result confirms what has long been known, i.e. that when using empirical methods it is crucial to define photometrically homogeneous populations of objects.

In the more general case it would be necessary to define photometrically homogeneous populations of objects in the absence of a priori information, and therefore relying only on the photometric data themselves. This task, as has been shown for instance by Suchkov et al. (2005) and Bazell & Miller (2004), is a non-trivial one, since the complexity of astronomical data and the level of degeneracy are so high that most unsupervised clustering methods partition the photometric parameter space into far too many clusters, thus preventing the build-up of a suitable base of knowledge. A possible way to solve this problem will be discussed in Paper III.